Genetic variants of MUC4 are associated with susceptibility to and mortality of colorectal cancer and exhibit synergistic effects with LDL-C levels

As a disease with high mortality and prevalence rates worldwide, colorectal cancer (CRC) has been thoroughly investigated. Mucins are involved in the induction of CRC and the regulation of intestinal homeostasis, but MUC4, a member of the mucin gene family, has a controversial role in CRC: it has been associated with both decreased susceptibility to CRC and a worse prognosis of CRC. In our study, the multifunctional aspects of MUC4 were elucidated by genetic polymorphism analysis in a case-control study of 420 controls and 464 CRC patients. The MUC4 rs1104760 A>G polymorphism had a protective effect on CRC risk (AG, AOR = 0.537; GG, AOR = 0.297; dominant model, AOR = 0.493; recessive model, AOR = 0.382), and MUC4 rs2688513 A>G was associated with an increased mortality rate of CRC (5 years, GG, adjusted HR = 6.496; recessive model, adjusted HR = 5.848). In addition, MUC4 rs1104760 A>G showed a high probability of being a potential biomarker for CRC patients with low-density lipoprotein cholesterol (LDL-C) in the risk range and showed a significant synergistic effect with the LDL-C level. This is the first study to indicate a significant association between MUC4 genetic polymorphisms and CRC prevalence, suggesting a functional genetic variant that, together with the LDL-C level, may be relevant for CRC prevention.


Introduction
Colorectal cancer (CRC) is the third most common cancer and the fourth leading cause of cancer-related death worldwide [1]. Because of its high prevalence and mortality rates, many researchers have studied the molecular mechanisms of CRC, but its incidence and death rate are still high, and its clinical treatment via surgery remains stagnant [2]. Therefore, new treatment concepts including microsatellite instability (MSI) and KRAS or BRAF mutations have become an area of focus for identifying the genetic causes of CRC and improving personalized medicine for CRC patients [3,4]. CRC is a highly prevalent malignancy with a multifactorial etiology, which includes metabolic alterations as contributors to disease development. However, few studies have shown an association between genetic variants and metabolic factors because of the complexity of the correlations among them. Therefore, we examined the correlations between genetic variants of a well-known CRC-related gene, MUC4, and CRC prevalence with regard to metabolic factors.
Mucins are a family of molecules responsible for the protection, repair, and survival of epithelial tissue in the intestines [5]. They maintain the homeostasis and physiological environment of the gut by preventing the invasion of pathogens. Several members of the mucin gene family show abnormal expression in CRC: MUC1 [6] and MUC13 [7] act as oncogenes, and MUC2 [8] and MUC6 [9] act as tumor suppressors. Interestingly, MUC4 also shows aberrant expression in many epithelial tissues, including in CRC, but the role of MUC4 in CRC is controversial because studies of MUC4 expression in CRC have shown conflicting results. While some studies have suggested that loss or reduction of MUC4 expression occurs in CRC [10], other studies have suggested that the majority (approximately 75%) of CRC tumors have a decreased level or loss of MUC4 expression, and a subset (approximately 25%) of CRC tumors have high MUC4 expression [11,12].
Although many previous studies have indicated aberrant MUC4 expression in CRC carcinogenesis, only one study showed an association between genetic polymorphisms of MUC4 and the prognosis of CRC [13][14][15]. Therefore, we analyzed single nucleotide polymorphisms (SNPs) of MUC4 in CRC patients in the Korean population. We selected MUC4 rs882605, rs1104760, and rs2688513, which were suggested to have a significant effect on MUC4 expression in our previous paper [16], and rs2246901, which was associated with epithelial tumors [17]. We analyzed genetic associations between MUC4 SNPs and CRC prevalence as well as metabolic factors to determine potential biomarkers for CRC development. The present paper is the first study to suggest that genetic variants of MUC4 play important roles in the prevalence and prognosis of CRC, and it also suggests a significant synergistic effect between the low-density lipoprotein cholesterol (LDL-C) level and a MUC4 polymorphism on CRC prevalence. Considering that novel medical treatments are needed for CRC therapy, this study provides a new perspective for initiating personalized medicine for the diagnosis and treatment of CRC.

Study population
This case-control study included 884 individuals enrolled between 1996 and 2009 and was reviewed and approved by the Institutional Review Board of CHA Bundang Medical Center (IRB No. 2009-08-077) on October 20, 2009. Sample data were accessed from October 2021 to April 2022 for the purpose of our study. The 464 CRC patients were diagnosed at the CHA Bundang Medical Center (Seongnam, South Korea), had histologically proven adenocarcinoma, and had undergone surgical resection with curative intent. The patients included 260 colon cancer patients, 192 rectal cancer patients, and 12 consecutive patients with unclassified CRC who had undergone primary surgery. Tumor classification was conducted according to the tumor, node, and metastasis classification staging system of the 7th American Joint Committee on Cancer staging manual. Hypertension and diabetes mellitus were classified using the same criteria used in our previous study [18]. The controls included 420 individuals who were randomly selected from a health screening program; participants with a history of thrombotic diseases or cancers were excluded. All participants were Korean and provided written informed consent. Our study followed the recommendations of the Declaration of Helsinki.

Genotyping
DNA was extracted from white blood cells using a G-DEX II Genomic DNA Extraction kit (iNtRON Biotechnology, South Korea). The SNPs of MUC4 were determined based on our previous studies and articles in the literature. TaqMan allele discrimination analysis was used to determine the genotypes, and the analysis protocol was the same as that used in our previous study [19]. We randomly chose about 10% of the samples to confirm the results and performed sequencing. The concordance between the experimental results and randomly repeated samples was 100%.

Statistical analysis
For comparisons of baseline characteristics between the CRC and control groups, chi-square tests and Student's t-tests were used to assess categorical and continuous data, respectively. All genotype frequencies of polymorphisms were in Hardy-Weinberg equilibrium (HWE) for both controls and patients, and these polymorphisms were analyzed in reference to the wild-type genotype. Multivariate logistic regression was applied to estimate the association of MUC4 polymorphisms with CRC occurrence using adjusted odds ratios (AORs) and 95% confidence intervals (CIs) that were adjusted for age, gender, hypertension, diabetes mellitus, body mass index (BMI), and HDL-C levels. These adjustment variables were selected because they are risk factors for metabolic syndrome that affects CRC. Additionally, ROC curve analysis was conducted to assess the relationship between genetic polymorphisms and disease status, and subgroup analyses were performed for a range of environmental factors. An area under the curve (AUC) of approximately 1.00 indicated that a variable was a precise biomarker for CRC, while an AUC of 0.50 indicated that the variable was not an accurate biomarker. In general, an AUC greater than 0.60 indicated that a variant was a significant biomarker for the disease. The associations between clinical characteristics and genetic polymorphisms were assessed using an analysis of variance.
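The Hardy-Weinberg check mentioned above can be illustrated with a minimal Python sketch. It assumes a standard 1-degree-of-freedom chi-square goodness-of-fit test at a biallelic SNP; the text does not state which HWE test was used, and the function name and genotype counts below are hypothetical, not study data.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for HWE at a biallelic SNP (1 df).

    Assumes both alleles are present (otherwise expected counts are zero).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical genotype counts (AA, AG, GG); not taken from the study
chi2, p_value = hwe_chi_square(250, 140, 30)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A P-value above 0.05 in such a test is conventionally read as no detectable departure from HWE.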
The effects of correlations between environmental factors and genetic variants on CRC were analyzed via interaction analyses and stratified analyses. Spearman correlation analysis was also conducted to assess correlations between lipid-related factors, such as the correlation between HDL-C and LDL-C, after adjustment for age, sex, and BMI. Survival analysis was implemented using the Cox proportional hazards model. Survival was measured using the same method used in our previous study [20]. The results were adjusted for age, sex, presence of hypertension, presence of diabetes mellitus, tumor size, tumor differentiation, chemotherapy, smoking, and alcohol use. We excluded 100 CRC patients who had an insufficient medical history. Overall survival was defined as the time from surgery to death or the final follow-up, and relapse-free survival was defined as the time from surgery to cancer relapse or the final follow-up. Participants were followed for a median of 34 months (range, 4-173 months). Hazard ratios (HRs) are presented with 95% CIs. All analyses were conducted using Medcalc version 12.7.1.0 (Medcalc Software, Mariakerke, Belgium) and GraphPad Prism 4.0 (GraphPad Software Inc., San Diego, CA, USA). P-values < 0.05 were regarded as significant, and the false discovery rate (FDR) method was used to control the overall experimental error rate under multiple testing. The FDR method provides a measure of the expected proportion of false positives among the results declared significant; therefore, an association with FDR-P < 0.05 is regarded as remaining significant after correction for multiple comparisons.
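As an illustration of FDR adjustment, the following Python sketch implements the Benjamini-Hochberg step-up procedure. This is an assumption: the text does not name the specific FDR procedure or the family of tests that was corrected together, so the sketch is not expected to reproduce the FDR-P values reported in this paper. The function name and input P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P-values for four tests; not taken from the study
raw = [0.010, 0.200, 0.030, 0.004]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"P = {p:.3f} -> FDR-P = {q:.3f}")
```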

Comparison of baseline characteristics between CRC patients and controls
The baseline characteristics were evaluated in controls and colorectal, colon, and rectal cancer patients (Table 1). Before analysis, controls were randomly matched to cases, and the chi-square test and t-test were used to confirm that sex and age did not differ between patients and controls (P > 0.05). Regarding lipid level-related factors, the differences between controls and each of the three groups of cancer patients were statistically significant, except for the folate and triglyceride levels of rectal cancer patients. The levels of lipid-related factors were significantly lower in all groups of cancer patients than in healthy subjects.

MUC4 rs1104760 A>G and rs2688513 A>G polymorphisms are associated with decreased susceptibility to CRC
The effects of four MUC4 polymorphisms (rs882605 G>T, rs1104760 A>G, rs2688513 A>G, and rs2246901 A>C) on CRC risk were evaluated after adjustment for age, sex, hypertension, diabetes mellitus, body mass index (BMI), and high-density lipoprotein cholesterol (HDL-C) levels (Table 2). The MUC4 rs1104760 G allele had a protective effect against CRC occurrence compared to the A allele (AG, AOR = 0.537, P = 0.010, FDR-P = 0.040; GG, AOR = 0.297, P = 0.008, FDR-P = 0.032; AA vs AG+GG, AOR = 0.493, P = 0.002, FDR-P = 0.008; AA+AG vs GG, AOR = 0.382, P = 0.027, FDR-P = 0.108), although the recessive model did not remain significant after FDR correction. When cases were classified by tumor location as colon cancer or rectal cancer, the majority of the findings remained statistically significant (in colon cancer: AG, AOR = 0.433, 95% CI = 0.251-0.747, P = 0.003, FDR-P = 0.012; GG, ...). Notably, the associations of colon cancer prevalence with the heterozygous genotype and with the dominant model remained significant after FDR correction. The MUC4 rs2688513 AG genotype was significantly less frequent in the colon cancer group, although its significance was not maintained after FDR correction. However, the frequencies of the MUC4 rs882605 G>T and rs2246901 A>C polymorphisms did not show a remarkable association with CRC susceptibility. Moreover, because MSI is closely related to CRC and MSI-high status is an emerging predictive and prognostic biomarker for the immunotherapy response in cancer [21], we measured the associations of MUC4 polymorphisms with MSI status (Table 3). The MUC4 rs1104760 AG and GG genotypes and the dominant model had a protective effect in MSI patients, and the dominant model maintained a significant P-value after adjusting for the FDR. Interestingly, in MSI-high-status patients, both the heterozygous genotype and the dominant model of MUC4 rs882605, rs1104760, and rs2688513 indicated a significant association with CRC occurrence, whereas none of the MUC4 polymorphisms showed significance in MSI-low status.
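To make the genetic models concrete, the sketch below collapses genotype counts under the dominant (AA vs AG+GG) and recessive (AA+AG vs GG) models and computes a crude odds ratio with a Wald 95% CI from the resulting 2x2 table. It is only an illustration: the AORs reported here come from multivariate logistic regression with covariate adjustment, which this unadjusted calculation does not reproduce. The function names and genotype counts are hypothetical.

```python
import math

def collapse(counts, model):
    """counts = (AA, AG, GG); return (carrier, reference) counts for a model."""
    aa, ag, gg = counts
    if model == "dominant":   # AA vs AG+GG
        return ag + gg, aa
    if model == "recessive":  # AA+AG vs GG
        return gg, aa + ag
    raise ValueError(f"unknown model: {model}")

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude OR and Wald CI for a 2x2 table.

    a, b = carrier and reference counts in cases;
    c, d = carrier and reference counts in controls.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Hypothetical genotype counts (AA, AG, GG); not taken from the study
cases = (300, 80, 20)
controls = (240, 120, 40)
for model in ("dominant", "recessive"):
    a, b = collapse(cases, model)
    c, d = collapse(controls, model)
    or_, lower, upper = odds_ratio_wald(a, b, c, d)
    print(f"{model}: OR = {or_:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```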

Correlation between the MUC4 rs1104760 A>G and the HDL-C and LDL-C concentrations regarding susceptibility to CRC
As CRC is a complex disease affected by various environmental factors, the synergistic effects of clinical parameters and MUC4 polymorphisms for CRC risk were assessed by performing stratified analysis and interaction analysis (Table 4, S1 Table). Interestingly, the MUC4 rs1104760 AA variant exhibited a stronger synergistic effect with LDL-C levels than did the GG+AG variant. When the MUC4 rs1104760 AA variant was combined with LDL-C levels in the risk range, CRC occurrence was increased approximately 5-fold compared with that in patients with this variant and LDL-C levels in the normal range (Table 4). In the interaction analysis, the HDL-C levels, which are closely related to LDL-C levels, had synergistic effects with the MUC4 rs1104760 AA genotype, showing significantly increased CRC risk when combined with the AA genotype (Fig 1). In addition, the LDL-C and HDL-C levels showed a positive correlation in a partial Spearman correlation analysis (ρ = 0.244), although it was a weak correlation. Furthermore, in a receiver operating characteristic (ROC) curve analysis, MUC4 rs1104760 A>G was indicated as a possible biomarker for CRC patients with high LDL-C levels (area under the curve (AUC) = 0.689) compared to those with normal LDL-C levels (AUC = 0.603) (Fig 2).
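The AUC values above can be read as the probability that a randomly chosen case has a higher predictor value than a randomly chosen control. The Python sketch below computes the AUC in that rank-based form, with ties counted as one half. The coding of the genotype as the number of A alleles (AA = 2, AG = 1, GG = 0) and the sample values are assumptions made for illustration; the text does not specify how the genotype entered the ROC analysis.

```python
def auc_rank(scores_cases, scores_controls):
    """AUC as P(case score > control score), counting ties as 0.5."""
    wins = 0.0
    for sc in scores_cases:
        for sn in scores_controls:
            if sc > sn:
                wins += 1.0
            elif sc == sn:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))

# Hypothetical predictor: number of rs1104760 A alleles per individual
# (AA = 2, AG = 1, GG = 0); values are illustrative, not study data
cases = [2, 2, 2, 1, 2, 0]
controls = [2, 1, 1, 0, 2, 1]
print(f"AUC = {auc_rank(cases, controls):.3f}")
```

An AUC near 0.5 means the predictor does not separate cases from controls, and an AUC near 1.0 means near-perfect separation, in line with the interpretation given in the statistical analysis section.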

Combined effects of MUC4 polymorphisms on the occurrence of CRC
To identify the combined effects of four MUC4 polymorphisms on CRC susceptibility, we analyzed haplotype and genotype combinations. The G-G-A-A assembly (MUC4 rs882605 G>T/rs1104760 A>G/rs2688513 A>G/rs2246901 A>C) was associated with a decreased CRC prevalence compared to the reference assembly (AOR = 0.286, 95% CI: 0.151-0.539, P < 0.0001, FDR-P = 0.001), and several of its subset combinations were also associated with decreased CRC occurrence compared to each reference assembly (Table 5, S2 Table). Interestingly, among the subsets, combinations that include the rs1104760 G allele had a significant impact on CRC risk (rs882605 G/rs1104760 G/rs2688513 A, OR = 0.313, P < 0.0001, FDR-P = 0.001; rs882605 G/rs1104760 G/rs2246901 A, OR = 0.285, P < 0.0001, FDR-P = 0.001; rs1104760 G/rs2688513 A/rs2246901 A, OR = 0.309, P < 0.0001, FDR-P = 0.001; rs882605 G/rs1104760 G, OR = 0.370, P < 0.0001, FDR-P = 0.0003; rs1104760 G/rs2688513 A, OR = 0.369, P = 0.0001, FDR-P = 0.0003; rs1104760 G/rs2246901 A, OR = 0.316, P < 0.0001, FDR-P = 0.0003) when the combination of the major alleles was set as the reference. Additionally, the combination of the MUC4 rs882605 T allele and rs1104760 A allele, which is not a subset of the G-G-A-A assembly, was associated with decreased CRC prevalence (OR = 0.354, 95% CI: 0.146-0.858, P = 0.016, FDR-P = 0.024). All significant allele combinations maintained significant P-values after FDR correction. Genotype combination analysis showed a similar pattern to that of haplotype analysis (S2 and S3 Tables). Most combinations that showed significant associations with decreased CRC risk were composed of alleles from the G-G-A-A assembly, which showed a significant effect (Table 5). The two combinations that did not contain significant alleles were associated only with reduced susceptibility to colon cancer. Interestingly, both of these combinations included the AA genotype of rs1104760: AA/AG of rs1104760 and rs2688513 (AOR = 0.111, 95% CI: 0.016-0.763, P = 0.025, FDR-P = 0.040) and AA/AC of rs1104760 and rs2246901 (AOR = 0.149, 95% CI: 0.033-0.675, P = 0.014, FDR-P = 0.037).
Additionally, linkage disequilibrium (LD) block analysis was conducted to measure LD between polymorphisms (S2 Fig). Strong LD was observed between each pair of SNPs, and the strongest correlation was found between MUC4 rs2246901 A>C and rs2688513 A>G (R² = 0.88), but an LD block was not found.
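For readers unfamiliar with the LD measures, the following Python sketch computes D, D', and R² for a pair of biallelic SNPs from a haplotype frequency and the two allele frequencies, using the standard definitions. The LD values in this study were obtained with Haploview; the function name and the frequencies below are hypothetical and serve only to show how the quantities relate.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', R^2) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele a at SNP 1 and b at SNP 2
    p_a, p_b: frequencies of allele a and allele b (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    # D' scales |D| by its maximum possible value given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical frequencies; not taken from the study
d, d_prime, r2 = ld_stats(p_ab=0.18, p_a=0.20, p_b=0.22)
print(f"D = {d:.4f}, D' = {d_prime:.3f}, R^2 = {r2:.3f}")
```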

Discussion
CRC is closely associated with various genetic conditions related to alterations in intestinal homeostasis that allow its carcinogenesis to proceed [22]. Since the accumulation of genetic mutations can lead to cancer development, several genetic polymorphisms have been correlated with CRC risk in genome-wide association studies [23,24]. One representative CRC-related gene, MUC4, has been reported to show aberrant expression in CRC patients, but only one study suggested a significant role of MUC4 SNPs in CRC progression [13]. In this study, we analyzed the associations between MUC4 rs882605 G>T, rs1104760 A>G, rs2688513 A>G, and rs2246901 A>C polymorphisms and CRC prevalence and prognosis to elucidate the multifunctional aspects of MUC4 genetic polymorphisms. CRC development is associated with both genetic and environmental factors. In particular, obesity is firmly established as a significant risk factor for CRC development, and dyslipidemia is a well-known obesity-related metabolic feature [5]. The role of lipid alterations in CRC etiology is not clear, but HDL-C and LDL-C levels are considered to be correlated with CRC development. HDL-C has anti-inflammatory and immunomodulatory activities and reduces the risk of CRC [25]. This lipoprotein prevents the conversion of macrophages to the proinflammatory M1 phenotype, thus decreasing the proinflammatory milieu, which can lead to cancer by increasing the irregularities in the intestine [26]. In contrast, LDL-C is suggested to induce inflammation, which can increase the risk of CRC, although the mechanism remains unclear. LDL-C is hypothesized to promote cholesterol accumulation and enhance inflammation, inducing atherosclerosis, which is highly associated with CRC and shares common risk factors [27][28][29]. In addition, increased metastasis was associated with a high LDL-C level [30]. Mucin glycoproteins may be correlated with lipoproteins through their involvement in inflammation.
As the major macromolecular components of mucus, mucin glycoproteins can regulate inflammation in the intestine, which can damage the mucus barrier, worsen mucus quality, and reduce mucus production [31,32]. Consistent with previous results, our results showed a high correlation between the MUC4 rs1104760 A>G polymorphism and lipoproteins. The MUC4 rs1104760 AA variant had a synergistic effect with LDL-C levels, exhibiting fivefold higher CRC risk when combined with LDL-C levels in the risk range compared with AG+GG variants in individuals with LDL-C levels in the normal range (Table 4). Moreover, the MUC4 rs1104760 AA variant exhibited an approximately fourfold increased risk of CRC when combined with HDL-C levels in the risk range compared with the AG+GG variant in individuals with HDL-C levels in the normal range (Fig 1). Furthermore, ANOVA showed that the MUC4 rs1104760 AA variant was associated with significantly higher LDL-C levels than those found with the AG and GG variants (S1 Fig). As a result, we suggest that MUC4 rs1104760 A>G is a functional SNP in MUC4 and that its AA genotype affects LDL-C levels and inflammation, inducing CRC development.
The MUC4 rs1104760 A>G polymorphism was associated with CRC occurrence without combination with metabolic factors. In the genetic association analysis, its G allele had a protective tendency for CRC risk compared with that of the A allele, which was associated with high LDL-C levels. Consistent with previous studies of reduced MUC4 expression in CRC patients [33,34], our results elucidated a protective role of MUC4 in CRC patients according to its SNPs. Furthermore, we found that the MUC4 rs1104760 A>G variant combined with LDL-C levels in the risk range showed greater predictive value for CRC occurrence than the same variant combined with LDL-C levels in the control range through ROC curve analysis, the representative diagnostic test (Fig 2). However, the exact mechanism by which the MUC4 rs1104760 A>G affects inflammation and LDL-C levels and contributes to CRC development should be further studied.
Interestingly, we also showed an aggressive role of MUC4 regarding CRC prognosis, which helped to elucidate the controversial role of MUC4 in CRC patients. In the survival analysis, the MUC4 rs2688513 GG variant was associated with a poor prognosis of CRC compared with the AA and AA+AG variants (Fig 3). This result is consistent with previous findings indicating that MUC4 was overexpressed in a subset of CRC patients with a worse prognosis [11,12]. MUC4 contains three epidermal growth factor (EGF) domains; EGF is a common mitogenic factor that stimulates the proliferation of different cell types, especially fibroblasts and epithelial cells [30]. MUC4 may act as an intramembrane ligand for the receptor tyrosine kinase ErbB2 and perform an anti-apoptotic function to promote tumor progression [13,34,35]. Additionally, we showed that the G allele of rs2688513 was not associated with decreased CRC risk in the haplotype combination analysis, while its A allele was significantly associated with decreased CRC risk when combined with the A allele of the MUC4 rs1104760 polymorphism (Table 5, S2 Table). Therefore, we suggest that MUC4 has a significant effect on worsening the prognosis of CRC when the GG genotype of the MUC4 rs2688513 polymorphism is present.
As missense variants located in the second exon, the MUC4 rs1104760 A>G and rs2688513 A>G polymorphisms change the second and first base of the codon, converting isoleucine (ATC) to threonine (ACC) and serine (TCA) to proline (CCA), respectively [16]. Thus, both polymorphisms are highly likely to alter the gene function and be significantly associated with the prevalence and prognosis of CRC. Our results are consistent with those of previous studies in which underexpression of MUC4 was reported in most CRC patients while overexpression of MUC4 was reported in a subset of CRC patients with a poor prognosis. Therefore, we suggest the MUC4 rs1104760 A>G polymorphism as a novel biomarker for CRC treatment because it is correlated with LDL-C levels and may affect inflammation in the intestine, thus inducing CRC development. Additionally, we suggest the MUC4 rs2688513 A>G polymorphism as a marker of a poor CRC prognosis, although further research is needed to determine the correlation between the MUC4 rs2688513 A>G polymorphism and EGF domains.
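The two codon changes described above can be checked with a short Python sketch. The codon table is restricted to the four codons named in the text, and the function names are hypothetical. Note that on the strand on which the codons are written, both changes are T>C substitutions, which presumably corresponds to the A>G notation on the complementary strand.

```python
# Standard genetic code entries for the four codons named in the text
CODON_TO_AA = {"ATC": "Ile", "ACC": "Thr", "TCA": "Ser", "CCA": "Pro"}

def substitute(codon, position, new_base):
    """Return the codon with the base at a 1-based position replaced."""
    i = position - 1
    return codon[:i] + new_base + codon[i + 1:]

def describe(codon, position, new_base):
    """Describe the amino acid change caused by a single-base substitution."""
    mutant = substitute(codon, position, new_base)
    return f"{CODON_TO_AA[codon]} ({codon}) -> {CODON_TO_AA[mutant]} ({mutant})"

# rs1104760: second base of the codon changes
print("rs1104760:", describe("ATC", 2, "C"))
# rs2688513: first base of the codon changes
print("rs2688513:", describe("TCA", 1, "C"))
```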
In the medical treatment of CRC, there has been a shift from surgery as the main mode of treatment toward personalized treatments for individual care, because CRC is a complicated disease that occurs with the accumulation of genetic mutations and changes in epigenetic factors [36]. Genetic variants can be useful for personalized cancer treatment by predicting the impact of each allele on disease development or prognosis. To our knowledge, this study is the first to suggest MUC4 polymorphisms as possible biomarkers for CRC risk while considering related metabolic factors. MUC4 rs1104760 A>G may be a predictor of individual susceptibility to CRC, and MUC4 rs2688513 A>G may be a prognostic marker. By applying these concepts to clinical measures, it may be possible to distinguish whether patients require stronger preventative measures for CRC. However, there are some limitations to our study. First, the exact mechanisms of MUC4 were not confirmed. Although the present study showed a statistically significant association between LDL-C and MUC4 rs1104760 A>G, a clear explanation was not available. Nevertheless, MUC4 rs1104760 A>G has potential for use as a biomarker because ROC analysis indicated its significance in predicting CRC. Second, the study subjects were limited to a small sample recruited at one hospital, although the genotype frequencies satisfied HWE. Further studies should include more patients to establish MUC4 SNPs as biomarkers.

Conclusion
We investigated MUC4 rs882605 G>T, rs1104760 A>G, rs2688513 A>G, and rs2246901 A>C variants in controls and CRC patients and showed their association with susceptibility to and prognosis of CRC. In particular, MUC4 rs1104760 mutant allele had a protective effect against CRC prevalence compared to the wild allele. Furthermore, MUC4 rs1104760 A>G had a strong correlation with LDL-C with regard to CRC risk and had a predictive value in CRC patients with LDL-C levels in the risk range. Based on the effect of LDL-C on inflammation, which leads to CRC development, we suggest that the MUC4 rs1104760 A>G plays a substantial role in CRC pathology via the inflammatory processes related to LDL-C. In addition, as the MUC4 rs2688513 mutant genotype was associated with a worse prognosis of CRC compared with the wild genotype, we suggest that the MUC4 rs2688513 A>G polymorphism is a prospective marker for CRC progression. This is the first study to elucidate the multifunctional role of MUC4 in CRC patients while considering metabolic factors. Regarding the recent medical focus on personalized treatment, our results provide a significant cornerstone for further studies aimed to utilize MUC4 polymorphisms as individualized factors for CRC treatment and early diagnosis.
Supporting information
S1 Fig. Association between LDL-C level and MUC4 rs1104760 A>G and MUC4 rs2688513 A>G. In both MUC4 rs1104760 A>G and MUC4 rs2688513 A>G, wild genotypes have a higher mean LDL-C concentration, and mutant genotypes have a lower mean LDL-C concentration. All P-values were statistically significant (rs1104760 A>G, P = 0.037; rs2688513 A>G, P = 0.027). (DOCX)
S2 Fig. Linkage disequilibrium plots for four SNPs of the MUC4 gene using Haploview software. LD was calculated as D' and R² values using Haploview software, which shows linkage disequilibrium between the SNPs in the LD plot. D is the coefficient of LD and R² is the squared correlation. The number in each square denotes LD calculated using R²; a higher number means higher LD. The colored squares show the strength of LD; red means high LD, pink means moderate LD, and white means low LD. However, no LD block was found. (DOCX)
S1